(54) Title: NUCLEIC ACID BINDING PROTEINS



(57) Abstract

The invention provides a method for preparing a nucleic acid binding protein of the
Cys2-His2 zinc finger class capable of binding to a nucleic acid quadruplet in a target
nucleic acid sequence, wherein binding to base 4 of the quadruplet by an α-helical zinc
finger nucleic acid binding motif in the protein is determined as follows: if base 4 in the
quadruplet is A, then position +6 in the α-helix is Gln and position ++2 is not Asp; and if
base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as long
as position ++2 in the α-helix is not Asp.




WO 98/53060 



PCT/GB98/01516 



Nucleic Acid Binding Proteins

The present invention relates to nucleic acid binding proteins. In particular, the invention 
relates to a method for designing a protein which is capable of binding to any predefined 
nucleic acid sequence. 

Protein-nucleic acid recognition is a commonplace phenomenon which is central to a large 
number of biomolecular control mechanisms which regulate the functioning of eukaryotic 
and prokaryotic cells. For instance, protein-DNA interactions form the basis of the 
regulation of gene expression and are thus one of the subjects most widely studied by
molecular biologists. 

A wealth of biochemical and structural information explains the details of protein-DNA 
recognition in numerous instances, to the extent that general principles of recognition have 
emerged. Many DNA-binding proteins contain independently folded domains for the
recognition of DNA, and these domains in turn belong to a large number of structural 
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families. 

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA [Klug, (1993) Gene 135:83-92].
In light of the recurring physical interaction of α-helix and major groove, the tantalising
possibility arises that the contacts between particular amino acids and DNA bases could be
described by a simple set of rules; in effect a stereochemical recognition code which relates
protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by all
DNA-binding proteins. The structures of numerous complexes show significant differences
in the way that the recognition α-helices of DNA-binding proteins from different structural
families interact with the major groove of DNA, thus precluding similarities in patterns of
recognition. The majority of known DNA-binding motifs are not particularly versatile, and

[...] elucidated for the interactions of classical zinc fingers with nucleic acid. In this case a
pattern of rules is provided which covers binding to all nucleic acid sequences.

According to a first aspect of the present invention, therefore, we provide a method for
preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class capable of
binding to a nucleic acid quadruplet in a target nucleic acid sequence, wherein binding to
base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding motif in the protein
is determined as follows:

a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 is not
Asp;

b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as
long as position ++2 in the α-helix is not Asp.

Preferably, binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid
binding motif in the protein is additionally determined as follows:

c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6
is Ser or Thr and position ++2 is Asp;

d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and
position ++2 is Asp.
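
Rules a) to d) can be read as a small lookup table. The following is a minimal sketch of that reading, not part of the specification: the names `BASE4_RULES` and `satisfies_base4_rule` are hypothetical, residues are given as three-letter codes, and `None` is used here to mean "no constraint".

```python
# Rules a) to d) for base 4 of a quadruplet, transcribed from the text.
# Each base maps to one or more acceptable options. In an option:
#   "+6"         -> set of allowed residues at helix position +6 (None = any)
#   "++2_is_asp" -> True: ++2 must be Asp; False: must not be Asp; None: unconstrained
BASE4_RULES = {
    "A": [{"+6": {"Gln"}, "++2_is_asp": False}],
    "C": [{"+6": None, "++2_is_asp": False}],
    "G": [
        {"+6": {"Arg"}, "++2_is_asp": None},
        {"+6": {"Ser", "Thr"}, "++2_is_asp": True},
    ],
    "T": [{"+6": {"Ser", "Thr"}, "++2_is_asp": True}],
}


def satisfies_base4_rule(base4: str, pos6: str, pos_pp2: str) -> bool:
    """Return True if the (+6, ++2) residue pair fits any option for base 4."""
    for option in BASE4_RULES[base4]:
        ok6 = option["+6"] is None or pos6 in option["+6"]
        want_asp = option["++2_is_asp"]
        ok2 = want_asp is None or (pos_pp2 == "Asp") == want_asp
        if ok6 and ok2:
            return True
    return False


print(satisfies_base4_rule("A", "Gln", "Ser"))  # Gln at +6, ++2 not Asp
print(satisfies_base4_rule("T", "Thr", "Ala"))  # rule d) requires Asp at ++2
```

Keeping each rule as a list of options mirrors the "or" in rule c), where either Arg at +6, or Ser/Thr at +6 together with Asp at ++2, is acceptable.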

The quadruplets specified in the present invention are overlapping, such that, when read 3'
to 5' on the - strand of the nucleic acid, base 4 of the first quadruplet is base 1 of the
second, and so on. Accordingly, in the present application, the bases of each quadruplet
are referred to by number, from 1 to 4, 1 being the 3' base and 4 being the 5' base.
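
The overlap convention can be illustrated with a short sketch. It assumes the binding site is written 5' to 3' on the contacted strand, that its length is 3n + 1 for n fingers, and that the first quadruplet is the 3'-most one; the function name and the example site are hypothetical.

```python
def overlapping_quadruplets(site_5to3: str) -> list[str]:
    """Split a site (written 5'->3') into overlapping 4-base subsites.

    The first quadruplet returned is the 3'-most one. Each quadruplet is
    still written 5'->3', so base 4 is its leftmost character and base 1
    its rightmost, matching the numbering used in the text.
    """
    if len(site_5to3) < 4 or (len(site_5to3) - 1) % 3 != 0:
        raise ValueError("site length must be 3n + 1 with n >= 1")
    n = (len(site_5to3) - 1) // 3
    quads = []
    for i in range(n):
        start = len(site_5to3) - 4 - 3 * i  # step 3 bases towards the 5' end
        quads.append(site_5to3[start:start + 4])
    return quads


quads = overlapping_quadruplets("GACCGTAGCT")  # hypothetical 10-base site
print(quads)
# Base 4 of each quadruplet is base 1 of the next one.
for first, second in zip(quads, quads[1:]):
    assert first[0] == second[-1]
```

Stepping by three bases while taking four at a time is what makes a ten-base site correspond to three fingers.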

All of the nucleic acid-binding residue positions of zinc fingers, as referred to herein, are
numbered from the first residue in the α-helix of the finger, ranging from +1 to +9.
"-1" refers to the residue in the framework structure immediately preceding the α-helix in
a Cys2-His2 zinc finger polypeptide.

[...] acid, since it is this strand which is aligned 3' to 5'. These conventions are followed in the
nomenclature used herein. It should be noted, however, that in nature certain fingers, such
as finger 4 of the protein GLI, bind to the + strand of nucleic acid: see Suzuki et al.,
(1994) NAR 22:3397-3405 and Pavletich and Pabo, (1993) Science 261:1701-1707. The
incorporation of such fingers into nucleic acid binding molecules according to the invention 
is envisaged. 

The invention provides a solution to a problem hitherto unaddressed in the art, by 
permitting the rational design of polypeptides which will bind nucleic acid quadruplets 
whose 5' residue is other than G. In particular, the invention provides for the first time a 
solution for the design of polypeptides for binding quadruplets containing 5' A or C. 

Position +6 in the α-helix is generally responsible for the interaction with base 4 of a
given quadruplet in the target. According to the present invention, an A at base 4 interacts
with a Glutamine (Gln or Q) at position +6, while a C at base 4 will interact with any
amino acid provided that position ++2 is not Aspartic acid (Asp or D).

The present invention concerns a method for preparing nucleic acid binding proteins which 
are capable of binding nucleic acid. Thus, whilst the solutions provided by the invention 
will result in a functional nucleic acid binding molecule, it is possible that naturally- 
occurring zinc finger nucleic acid binding molecules may not follow some or all of the 
rules provided herein. This does not matter, because the aim of the invention is to permit 
the design of the nucleic acid binding molecules on the basis of nucleic acid sequence, and 
not the converse. This is why the rules, in certain instances, provide for a number of 
possibilities for any given residue. In other instances, alternative residues to those given 
may be possible. The present invention, thus, does not seek to provide every solution for 
the design of a binding protein for a given target nucleic acid. It does, however, provide 
for the first time a complete solution allowing a functional nucleic acid binding protein to 
be constructed for any given nucleic acid quadruplet.

[...] position +2 in the helix is responsible for determining the binding to base 1 of the
quadruplet. In doing so, it cooperates synergistically with position +6, which determines
binding at base 4 in the quadruplet, bases 1 and 4 being overlapping in adjacent
quadruplets.

A zinc finger binding motif is a structure well known to those in the art and defined in, for
example, Miller et al., (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102;
Lee et al., (1989) Science 245:635-637; see International patent applications WO
96/06166 and WO 96/32475, corresponding to USSN 08/422,107, incorporated herein by
reference.

As used herein, "nucleic acid" refers to both RNA and DNA, constructed from natural 
nucleic acid bases or synthetic bases, or mixtures thereof. Preferably, however, the 
binding proteins of the invention are DNA binding proteins. 

In general, a preferred zinc finger framework has the structure: 

(A)  X_{0-2} C X_{1-5} C X_{9-14} H X_{3-5} H/C

where X is any amino acid, and the numbers in subscript indicate the possible numbers of 
residues represented by X. 

In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may 
be represented as motifs having the following primary structure: 

(B)  X^a C X_{2-4} C X_{2-3} F X^c X X X X L X X H X X X^b H - linker
                                  -1 1 2 3 4 5 6 7 8 9

wherein X (including X^a, X^b and X^c) is any amino acid. X_{2-4} and X_{2-3} refer to the presence
of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together

[...] Preferably, the linker is T-G-E-K or T-G-E-K-P.

[...] As set out above, the major binding interactions occur with amino acids -1, +2, +3 and
+6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be
essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys.
Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say
are not Phe, Trp or Tyr.
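
As an illustration of how the primary structure (B) and the helix numbering fit together, the sketch below matches a one-letter finger sequence against a regular expression and labels positions -1 to +9. It is only a sketch under stated assumptions: the names `FINGER_RE` and `helix_positions` are hypothetical, and X^a is taken to be one to three residues because its length is not fixed in the text above. The test sequence is the first finger given in Example 1.

```python
import re

# Motif (B): X^a C X(2 or 4) C X(2 or 3) F X^c [helix window] X^b H linker.
# Assumption: X^a is matched as 1-3 residues; the text does not fix its length.
FINGER_RE = re.compile(
    r"^(?P<xa>.{1,3})C(?:.{2}|.{4})C.{2,3}F."
    r"(?P<helix>.{4}L.{2}H.{2})"  # positions -1, +1 ... +9 (+4 = Leu, +7 = His)
    r".H(?P<linker>.*)$"
)


def helix_positions(finger: str) -> dict:
    """Map the labels '-1', '+1' ... '+9' to residues of a finger sequence."""
    match = FINGER_RE.match(finger)
    if match is None:
        raise ValueError("sequence does not fit motif (B)")
    labels = ["-1"] + [f"+{i}" for i in range(1, 10)]
    return dict(zip(labels, match.group("helix")))


positions = helix_positions("FQCRICMRNFSDRSSLTRHTRTHTGEKP")
# The residues singled out by the text as making the major binding interactions.
print({label: positions[label] for label in ("-1", "+2", "+3", "+6")})
```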

In a most preferred aspect, therefore, bringing together the above, the invention allows the
definition of every residue in a zinc finger nucleic acid binding motif which will bind
specifically to a given nucleic acid quadruplet.

The code provided by the present invention is not entirely rigid; certain choices are
provided. For example, positions +1, +5 and +8 may have any amino acid allocation,
whilst other positions may have certain options: for example, the present rules provide
that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In
its broadest sense, therefore, the present invention provides a very large number of proteins
which are capable of binding to every defined target nucleic acid quadruplet.

Preferably, however, the number of possibilities may be significantly reduced. For
example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys,
Thr and Gln respectively as a default option. In the case of the other choices, for example,
the first-given option may be employed as a default. Thus, the code according to the
present invention allows the design of a single, defined polypeptide (a "default"
polypeptide) which will bind to its target quadruplet.
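
The idea of a "default" polypeptide can be sketched as filling every helix position not chosen by the recognition rules with a fixed residue. The sketch below is illustrative only: `default_helix` is a hypothetical name, +4 = Leu and +7 = His are taken from the largely invariant residues of structure (B), and +9 is set to Arg as an assumption (the text allows Arg or Lys).

```python
# Defaults stated in the text: +1 = Lys, +5 = Thr, +8 = Gln.
# +4 = Leu and +7 = His follow the largely invariant residues of motif (B).
# Assumption: +9 = Arg (the text prefers Arg or Lys without choosing one).
DEFAULTS = {"+1": "K", "+4": "L", "+5": "T", "+7": "H", "+8": "Q", "+9": "R"}
ORDER = ["-1", "+1", "+2", "+3", "+4", "+5", "+6", "+7", "+8", "+9"]


def default_helix(minus1: str, plus2: str, plus3: str, plus6: str) -> str:
    """Build the 10-residue window (-1 to +9) from four rule-chosen residues."""
    chosen = {**DEFAULTS, "-1": minus1, "+2": plus2, "+3": plus3, "+6": plus6}
    return "".join(chosen[position] for position in ORDER)


# Hypothetical choice of the four recognition residues (one-letter codes).
print(default_helix("D", "S", "S", "R"))
```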

In a further aspect of the present invention, there is provided a method for preparing a 
nucleic acid binding protein of the Cys2-His2 zinc finger class capable of binding to a 
target nucleic acid sequence, comprising the steps of: 

[...] residues known to affect binding to bases at which the natural and desired targets differ.
Otherwise, mutation of the model fingers should be concentrated upon residues -1, +2, +3
and +6 as provided for in the foregoing rules.

In order to produce a binding protein having improved binding, moreover, the rules
provided by the present invention may be supplemented by physical or virtual modelling of
the protein/nucleic acid interface in order to assist in residue selection.

Zinc finger binding motifs designed according to the invention may be combined into
nucleic acid binding proteins having a multiplicity of zinc fingers. Preferably, the proteins
have at least two zinc fingers. In nature, zinc finger binding proteins commonly have at
least three zinc fingers, although two-zinc finger proteins such as Tramtrack are known.
The presence of at least three zinc fingers is preferred. Binding proteins may be
constructed by joining the required fingers end to end, N-terminus to C-terminus.
Preferably, this is effected by joining together the relevant nucleic acid coding sequences
encoding the zinc fingers to produce a composite coding sequence encoding the entire
binding protein. The invention therefore provides a method for producing a nucleic acid
binding protein as defined above, wherein the nucleic acid binding protein is constructed by
recombinant DNA technology, the method comprising the steps of:

a) preparing a nucleic acid coding sequence encoding two or more zinc finger binding 
motifs as defined above, placed N-terminus to C-terminus; 

b) inserting the nucleic acid sequence into a suitable expression vector; and 

c) expressing the nucleic acid sequence in a host organism in order to obtain the nucleic
acid binding protein.

A "leader" peptide may be added to the N-terminal finger. Preferably, the leader peptide 
is MAEEKP. 






The nucleic acid encoding the nucleic acid binding protein according to the invention can 
be incorporated into vectors for further manipulation. As used herein, vector (or plasmid)

[...] can be amplified by PCR and be directly transfected into the host cells without any
replication component.

Advantageously, an expression and cloning vector may contain a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which 
facilitates the selection for transformants due to the phenotypic expression of the marker 
gene. Suitable markers for yeast are, for example, those conferring resistance to the antibiotics
G418, hygromycin or bleomycin, or those providing for prototrophy in an auxotrophic yeast
mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker
and an E. coli origin of replication are advantageously included. These can be obtained
from E. coli plasmids, such as pBR322, Bluescript vector or a pUC plasmid, e.g. pUC18
or pUC19, which contain both E. coli replication origin and E. coli genetic marker
conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the identification of
cells competent to take up nucleic acid binding protein nucleic acid, such as dihydrofolate
reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring
resistance to G418 or hygromycin. The mammalian cell transformants are placed under
selection pressure which only those transformants which have taken up and are expressing
the marker are uniquely adapted to survive. In the case of a DHFR or glutamine synthase
(GS) marker, selection pressure can be imposed by culturing the transformants under
conditions in which the pressure is progressively increased, thereby leading to amplification

[...] UV5 promoter. This system has been employed successfully for over-production of many
proteins. Alternatively the polymerase gene may be introduced on a lambda phage by
infection with an int- phage such as the CE6 phage which is commercially available
(Novagen, Madison, USA). Other vectors include vectors containing the lambda PL
promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoters such as
pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE) or vectors containing
the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs,
MA, USA).

Moreover, the nucleic acid binding protein gene according to the invention preferably 
includes a secretion sequence in order to facilitate secretion of the polypeptide from 
bacterial hosts, such that it will be produced as a soluble native peptide rather than in an 
inclusion body. The peptide may be recovered from the bacterial periplasmic space, or the 
culture medium, as appropriate. 

Suitable promoting sequences for use with yeast hosts may be regulated or constitutive and
are preferably derived from a highly expressed yeast gene, especially a Saccharomyces
cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or ADHII gene, the acid
phosphatase (PHO5) gene, a promoter of the yeast mating pheromone genes coding for the
a- or α-factor or a promoter derived from a gene encoding a glycolytic enzyme such as the
promoter of the enolase, glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate
kinase (PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose
phosphate isomerase, phosphoglucose isomerase or glucokinase genes, or a promoter from
the TATA binding protein (TBP) gene can be used. Furthermore, it is possible to use
hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene and
downstream promoter elements including a functional TATA box of another yeast gene, for
example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and downstream
promoter elements including a functional TATA box of the yeast GAP gene (PHO5-GAP
hybrid promoter). A suitable constitutive PHO5 promoter is e.g. a shortened acid
phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as the

[...] contain nucleotide segments transcribed as polyadenylated fragments in the untranslated
portion of the mRNA encoding nucleic acid binding protein.

An expression vector includes any vector capable of expressing nucleic acid binding protein
nucleic acids that are operatively linked with regulatory sequences, such as promoter
regions, that are capable of expression of such DNAs. Thus, an expression vector refers to
a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or
other vector, that upon introduction into an appropriate host cell, results in expression of
the cloned DNA. Appropriate expression vectors are well known to those with ordinary
skill in the art and include those that are replicable in eukaryotic and/or prokaryotic cells
and those that remain episomal or those which integrate into the host cell genome. For
example, DNAs encoding nucleic acid binding protein may be inserted into a vector
suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector
such as pEVRF (Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide
for the transient expression of DNA encoding nucleic acid binding protein in mammalian
cells. Transient expression usually involves the use of an expression vector that is able to
replicate efficiently in a host cell, such that the host cell accumulates many copies of the
expression vector, and, in turn, synthesises high levels of nucleic acid binding protein. For
the purposes of the present invention, transient expression systems are useful e.g. for
identifying nucleic acid binding protein mutants, to identify potential phosphorylation sites,
or to characterise functional domains of the protein.

Construction of vectors according to the invention employs conventional ligation
techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the
form desired to generate the plasmids required. If desired, analysis to confirm correct
sequences in the constructed plasmids is performed in a known fashion. Suitable methods
for constructing expression vectors, preparing in vitro transcripts, introducing DNA into
host cells, and performing analyses for assessing nucleic acid binding protein expression
and function are known to those skilled in the art. Gene presence, amplification and/or

[...] binding protein may be empirically determined and optimised for a particular cell and
assay.

Host cells are transfected or, preferably, transformed with the above-captioned expression 
or cloning vectors of this invention and cultured in conventional nutrient media modified as 
appropriate for inducing promoters, selecting transformants, or amplifying the genes 
encoding the desired sequences. Heterologous DNA may be introduced into host cells by 
any method known in the art, such as transfection with a vector encoding a heterologous 
DNA by the calcium phosphate coprecipitation technique or by electroporation. Numerous 
methods of transfection are known to the skilled worker in the field. Successful transfection
is generally recognised when any indication of the operation of this vector occurs in the
host cell. Transformation is achieved using standard techniques appropriate to the particular
host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic 
cells with a plasmid vector or a combination of plasmid vectors, each encoding one or more 
distinct genes or with linear DNA, and selection of transfected cells are well known in the 
art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press).

Transfected or transformed cells are cultured using media and culturing methods known in 
the art, preferably under conditions whereby the nucleic acid binding protein encoded by
the DNA is expressed. The composition of suitable media is known to those in the art, so 
that they can be readily prepared. Suitable culturing media are also commercially available. 

In a further aspect, the invention also provides means by which the binding of the protein 
designed according to the rules can be improved by randomising the proteins and selecting 
for improved binding. In this aspect, the present invention represents an improvement of 
the method set forth in WO 96/06166. Thus, zinc finger molecules designed according to 
the invention may be subjected to limited randomisation and subsequent selection, such as 
by phage display, in order to optimise the binding characteristics of the molecule.

[...] affinity purification. The phage are then amplified by passage through a bacterial host, and
subjected to further rounds of selection and amplification in order to enrich the mutant pool
for the desired phage and eventually isolate the preferred clone(s). Detailed methodology
for phage display is known in the art and set forth, for example, in US Patent 5,223,409;
Choo and Klug, (1995) Current Opinions in Biotechnology 6:431-436; Smith, (1985)
Science 228:1315-1317; and McCafferty et al., (1990) Nature 348:552-554; all
incorporated herein by reference. Vector systems and kits for phage display are available
commercially, for example from Pharmacia.

Randomisation of the zinc finger binding motifs produced according to the invention is
preferably directed to those residues where the code provided herein gives a choice of
residues. For example, therefore, positions +1, +5 and +8 are advantageously
randomised, whilst preferably avoiding hydrophobic amino acids; positions involved in
binding to the nucleic acid, notably -1, +2, +3 and +6, may be randomised also,
preferably within the choices provided by the rules of the present invention.

Preferably, therefore, the "default" protein produced according to the rules provided by the 
invention can be improved by subjecting the protein to one or more rounds of 
randomisation and selection within the specified parameters. 


Nucleic acid binding proteins according to the invention may be employed in a wide variety
of applications, including diagnostics and as research tools. Advantageously, they may be
employed as diagnostic tools for identifying the presence of nucleic acid molecules in a
complex mixture. Nucleic acid binding molecules according to the invention can
differentiate single base pair changes in target nucleic acid molecules.

Accordingly, the invention provides a method for determining the presence of a target 
nucleic acid molecule, comprising the steps of: 



a) preparing a nucleic acid binding protein by the method set forth above which is specific
for the target nucleic acid molecule;

[...] nucleic acid cleaving domain is fused to a nucleic acid binding domain comprising a zinc
finger as described herein.

The invention is described below, for the purpose of illustration only, in the following
examples, with reference to the figures, in which:

Figure 1 illustrates the design of a zinc finger binding protein specific for a G12V mutant 
ras oncogene; 

Figure 2 illustrates the binding specificity of the binding protein for the oncogene as
opposed to the wild-type ras sequence;

Figure 3 illustrates the results of an ELISA assay performed using the anti-ras binding
protein with both wild-type and mutant target nucleic acid sequences;

Figure 4 illustrates interactions between the Zif268 DNA-binding domain and DNA. (a)
Schematic diagram of modular recognition between the three zinc fingers of Zif268 and
triplet subsites of an optimised DNA binding site. Straight arrows indicate the
stereochemical juxtapositioning of recognition residues with bases of the contacted G-rich
DNA strand. Note that since the N-terminal finger contacts the 3' end of the DNA and the
C-terminal finger the 5' end, binding to the G-rich strand is said to be antiparallel. (b)
View of Zif268 finger 3 bound to DNA, showing the possibility of interaction with both
DNA strands. Co-ordinates from Pavletich & Pabo, (1991) Science 252:809-817. (c) The
potential hydrogen bonding network between bases on both strands of the DNA and
positions -1 (Arg) and 2 (Asp) of finger 3 (Pavletich & Pabo 1991). (d) Schematic diagram
of recognition between the three zinc fingers of Zif268 and an optimised DNA binding site
including 'cross-strand' interactions. Recognition contacts between Asp2 of each finger and
the parallel DNA strand (shown by curly arrows) mean that each finger binds overlapping,
4 bp subsites:

[...]

Example 1

Construction of a zinc finger protein 

The target selected for the zinc finger nucleic acid binding protein is the activating point
mutation of the human EJ bladder carcinoma ras oncogene, which was the first DNA lesion
reported to confer transforming properties on a cellular proto-oncogene. Since the original
discovery, ras gene mutations have been found to occur at high frequencies in a variety of
human cancers and are established targets for the diagnosis of oncogenesis at early stages of
tumour growth.

The EJ bladder carcinoma mutation is a single nucleotide change in codon 12 of H-ras,
which results in a mutation from GGC to GTC at this position. A zinc finger peptide is
designed to bind a 10bp DNA site assigned in the noncoding strand of the mutant ras gene,
such that three fingers contact 'anticodons' 10, 11 and 12 in series, as shown in Fig. 1,
plus the 5' preceding G (on the + strand of the DNA). The rationale of this assignment
takes into account the fact that zinc fingers make most contacts to one DNA strand, and the
mutant noncoding strand carries an adenine which can be strongly discriminated from the
cytosine present in the wild-type ras, by a bidentate contact from an asparagine residue.

The first finger of the designer lead peptide is designed according to the rules set forth
herein starting from a Zif268 finger 2 model to bind the quadruplet 5'-GCCG-3', which
corresponds to 'anticodon' 10 of the designated binding site plus one 3' base. The finger
has the following sequence:

F Q C R I C M R N F S D R S S L T R H T R T H T G E K P
                     -1 1 2 3 4 5 6 7 8 9

A DNA coding sequence encoding this polypeptide is constructed from synthesised 
oligonucleotides. 

[...] According to the recognition rules, the first finger of the lead peptide could contact
cytosine using one of Asp, Glu, Ser or Thr in the third α-helix position. To determine the
optimal contact, the codon for helical position 3 of finger 1 is engineered by cassette
mutagenesis to have position 1 = A/G, position 2 = A/C/G and position 3 = C/G. Therefore
in addition to Asp, Glu, Ser and Thr, the randomisation also specifies Ala, Arg, Asn, Gly
and Lys. Selections from this mini-library are over one round of phage binding to 5nM
mutant DNA oligo in 100 PBS containing 50µM ZnCl2, 2% (w/v) fat-free dried milk
(Marvel) and 1% (v/v) Tween-20, with 1µg poly dIdC as competitor, followed by six
washes with PBS containing 50µM ZnCl2 and 1% (v/v) Tween-20. Bound phage are eluted
with 0.1M triethylamine for 3 mins. and immediately transferred to an equal volume of 1M
Tris-Cl pH 7.4.
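
The residue set specified by the randomised codon can be checked by enumerating the twelve codons and translating them with the standard genetic code. The sketch below does this; the names `CODON_TABLE` and `encoded_residues` are hypothetical, and residues are reported as one-letter codes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}


def encoded_residues(first: str, second: str, third: str) -> set:
    """Residues encoded when each codon position is drawn from the given bases."""
    return {CODON_TABLE[a + b + c] for a, b, c in product(first, second, third)}


# Codon positions as stated in the text: 1 = A/G, 2 = A/C/G, 3 = C/G.
library = encoded_residues("AG", "ACG", "CG")
print(sorted(library))
```

The twelve codons collapse to nine distinct residues because Thr, Ala and Gly are each encoded by two members of the set.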

A single round of randomisation and selection is found to be sufficient to improve the
affinity of the lead zinc finger peptide to this standard. A small library of mutants is
constructed with limited variations specifically in the third α-helical position (+3) of finger
1 of the designed peptide. Selection from this library yields an optimised DNA-binding
domain with asparagine at the variable position, which is able to bind the mutant ras
sequence with an apparent Kd of 3nM, i.e. equal to that of the wild-type Zif268 DNA-binding
domain (Fig. 2). The selection of asparagine at this position to bind opposite a
cytosine is an unexpected deviation from the recognition rules, which normally pair
asparagine with adenine.

The selection of asparagine is, however, consistent with physical considerations of the
protein-DNA interface. In addition to the classical bidentate interaction of asparagine and
adenine observed in zinc finger-DNA complexes, asparagine has been observed to bridge a
base-pair step in the major groove of DNA, for example in the co-crystal structures of the
GCN4 DNA-binding domain. A number of different base-pair steps provide the correct
stereochemical pairings of hydrogen bond donors and acceptors which could satisfy
asparagine, including the underlined step GCC of ras 'anticodon' 10. Although asparagine
in position 3 of the zinc finger helix would not normally be positioned to bridge a base-pair

[...] removed by washing the beads 6 times with PBS containing 50µM ZnCl2 and 1% (v/v)
Tween-20. The beads are subsequently incubated for 1h at RT with anti-M13 IgG
conjugated to horseradish peroxidase (Pharmacia Biotech) diluted 1:5000 in PBS containing
50µM ZnCl2 and 2% (w/v) fat-free dried milk (Marvel). Excess antibody is removed by
washing 6 times with PBS containing 50µM ZnCl2 and 0.05% (v/v) Tween, and 3 times
with PBS containing 50µM ZnCl2. The ELISA is developed with 0.1mg/ml
tetramethylbenzidine (Sigma) in 0.1M sodium acetate pH5.4 containing 2µl of fresh 30%
hydrogen peroxide per 10ml buffer, and after approximately 1 min, stopped with an equal
volume of 2M H2SO4. The reaction produces a yellow colour which is quantitated by
subtracting the absorbance at 650nm from the absorbance at 450nm. It should be noted that
in this protocol the ELISA is not made competitive; however, soluble (non-biotinylated)
wild-type ras DNA could be included in the binding reactions, possibly leading to higher
discrimination between wild-type and mutant ras.

Phage are retained specifically by DNA bearing the mutant, but not the wild-type ras
sequence, allowing the detection of the point mutation by ELISA (Fig. 3).

Example 4 

Design of an anti-HIV zinc finger 

The sequence of the HIV TAR, the region of the LTR which is responsible for trans- 
activation by Tat, is known (Jones and Peterlin, (1994) Ann. Rev. Biochem. 63:717-743). 
A sequence within the TAR region is identified and a zinc finger polypeptide designed to 
bind thereto. 

The selected sequence is 5' - AGA GAG CTC - 3', which is the complement of nucleotides 
+34 to +42 of HIV. The corresponding amino acids required in fingers 1, 2 and 3 of a 
zinc finger binding protein are determined according to the rules set forth above, as 
follows: 
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vector. Electrocompetent TG1 cells are transformed with the recombinant vector. Single 
colonies of transformants are grown overnight in 2xTY containing 50µM ZnCl2 and 15µg/ml 
tetracycline. Single stranded DNA is prepared from phage in the culture supernatant and 
sequenced with Sequenase 2.0 (United States Biochemical). 

The polypeptide designed according to the invention is then tested for binding to HIV DNA 
and positive results are obtained. 
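
The relationship between the selected site and the strand it complements can be illustrated with a short Python sketch. The helper names are hypothetical, and the assignment of finger 1 to the 3' triplet is an assumption based on the Zif268-like arrangement, in which successive fingers read the site from 3' to 5'; it is not taken from this Example.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def triplets_by_finger(site: str) -> dict:
    """Split a 9 bp site into triplets; finger 1 takes the 3' triplet (assumed)."""
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    return {finger: t for finger, t in enumerate(reversed(triplets), start=1)}


site = "AGAGAGCTC"                       # 5'-AGA GAG CTC-3' from the text
other_strand = reverse_complement(site)  # strand that the site complements
subsites = triplets_by_finger(site)      # triplet assigned to each finger
```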

Example 5 

Alanine mutagenesis of the Asp2 in finger 3 is carried out on the wild-type Zif268 DNA- 
binding domain and four related peptides isolated from the phage display library as follows 
(see also Fig. 5): 

E. coli TG1 cells are transfected with fd phage displaying zinc fingers. Colony PCR is 
performed with one primer containing a single mismatch to create the Asp to Ala change in 
finger 3. Cloning of PCR product in phage vector is as described previously (Choo, Y. & 
Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. 
(1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172). Briefly, forward and backward 
PCR primers contain unique restriction sites for Not I or Sfi I respectively and amplify 
an approximately 300 base pair region encompassing three zinc fingers. PCR products are 
digested with Sfi I and Not I to create cohesive ends and are ligated to 100ng of similarly 
digested fd-Tet-SN vector. Electrocompetent TG1 cells are transformed with the 
recombinant vector. Single colonies of transformants are grown overnight in 2xTY 
containing 50µM ZnCl2 and 15µg/ml tetracycline. Single stranded DNA is prepared from 
phage in the culture supernatant and sequenced with Sequenase 2.0 (United States 
Biochemical). 
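
That a single primer mismatch suffices for the Asp to Ala change follows from the standard genetic code, as the Python sketch below illustrates. The function name is hypothetical, and the codon actually used for Asp2 of finger 3 in the Zif268 construct is not stated here, so both Asp codons are treated as possible inputs.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_changes(codon: str, target_aa: str) -> list:
    """Return codons one substitution away from `codon` that encode `target_aa`."""
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == target_aa:
                hits.append(mutant)
    return hits


# Either Asp codon (GAT or GAC) can be checked for one-mismatch routes to Ala.
asp_to_ala = {codon: single_base_changes(codon, "A") for codon in ("GAT", "GAC")}
```

In both cases the only route is an A to C change at the second codon position.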

The peptides are chosen for this experiment on the basis of the identity of the residue at 
position 6 of the middle finger. Peptide F2-Arg, which contains Arg at position 6 of finger 
2, is chosen since it should specify 5'-G in the 'middle' cognate triplet regardless of the 
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2. As would be expected, according to the hypothesis set out in the introduction, the 
mutation affects binding at the 5' position, while the specificity at the middle and 3' 
position remains unchanged. 

The mutation generally leads to a broadening of specificity, for instance in Zif268, where 
removal of Asp2 in finger 3 results in a protein which is unable to discriminate the 5' base 
of the middle triplet (Fig. 6a). However, the expectation that a new 5' base-specificity for 
the mutants might correlate with the identity of position 6 in finger 2 is not borne out. For 
example, F2-Gly would be expected to lose sequence discrimination but, although specificity 
is adversely affected, a slight preference for T is discernible (Fig. 6b). Similarly, F2-Val 
and F2-Asn, which might have been expected to acquire specificity for one nucleotide, 
instead have their specificities altered by the mutation (Fig. 6c, d): the F2-Val mutant 
allows G, A and T but not C, and the F2-Asn mutant appears to discriminate against both 
pyrimidines. In the absence of a larger database it is not possible to deduce whether these 
apparent specificities are the result of amino acid-base contacts from position 6 of finger 2, 
and if so whether these are general interactions which should be regarded as recognition 
rules. The apparent discrimination of F2-Gly, in particular, suggests that this is unlikely to 
be the case, but rather that in these particular examples other mechanisms are involved in 
determining sequence bias. 

In contrast to the loss of discrimination seen for the other four peptides, F2-Arg continues 
to specify guanine in the 5' position of the middle triplet regardless of the mutation in 
finger 3 (Fig. 6e). In this case, the specificity is derived from the strong interaction between 
guanine and Arg6 in finger 2. This contact has been observed a number of times in zinc 
finger co-crystal structures (Pavletich, N. P. & Pabo, C. O. (1993) Science 261, 1701-1707; 
Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) 
Nature (London) 366, 483-487; Kim, C. & Berg, J. M. (1996) 
Nature Str. Biol. 3, 940-945) and is the only recognition rule which relates amino acid 
identity at position 6 to a nucleotide preference at the 5' position of a cognate triplet (Choo, 
Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125). This interaction is compatible 
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To determine the contribution of Asp2 in finger 3 to the binding strength, apparent 
equilibrium dissociation constants are determined for Zif268 and F2-Arg before and after 
the Ala mutation (Fig. 7). Procedures are as described previously (Choo and Klug, 1994). 
Briefly, appropriate concentrations of 5'-biotinylated DNA binding sites are added to equal 
volumes of phage solution described above. Binding is allowed to proceed for one hour at 
20°C. DNA is captured with streptavidin-coated paramagnetic beads (500µg/well). The 
beads are washed 6 times with PBS/Zn containing 1% Tween, then 3 times with PBS/Zn. 
Bound phage are detected by ELISA with horseradish peroxidase-conjugated anti-M13 IgG 
(Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices). Binding 
data are plotted and analysed using Kaleidagraph (Abelbeck Software). 
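
One way to obtain an apparent dissociation constant from titration data of this kind is to fit a single-site binding isotherm, signal = Bmax x [DNA] / (Kd + [DNA]). The Python sketch below is an assumption about the form of the analysis, not the procedure actually run in Kaleidagraph; the function names, the log-spaced grid search and the synthetic data are all illustrative.

```python
def best_bmax(kd, conc, signal):
    """Closed-form least-squares amplitude Bmax for a fixed Kd."""
    frac = [c / (kd + c) for c in conc]
    return sum(f * s for f, s in zip(frac, signal)) / sum(f * f for f in frac)


def sse(kd, conc, signal):
    """Sum of squared residuals for a given Kd, using the best Bmax for that Kd."""
    bmax = best_bmax(kd, conc, signal)
    return sum((s - bmax * c / (kd + c)) ** 2 for c, s in zip(conc, signal))


def fit_kd(conc, signal, lo=1e-3, hi=1e3, steps=2000):
    """Scan Kd on a log-spaced grid and return the (Kd, Bmax) pair with the lowest error."""
    grid = [lo * (hi / lo) ** (i / steps) for i in range(steps + 1)]
    kd = min(grid, key=lambda k: sse(k, conc, signal))
    return kd, best_bmax(kd, conc, signal)


# Synthetic, noise-free titration (hypothetical units) with Kd = 4 and Bmax = 1.5.
conc = [0.5, 1, 2, 4, 8, 16, 32]
signal = [1.5 * c / (4 + c) for c in conc]
kd_fit, bmax_fit = fit_kd(conc, signal)
```

The ratio of two such fitted constants, for example mutant over wild-type, gives the fold change in apparent affinity.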



Both mutants show approximately a four-fold reduction in affinity for their respective 
binding sites under the conditions used. The reduction is likely a direct result of abolishing 
contacts from Asp2, rather than a consequence of changes in binding specificity at the 5' 
position of the middle triplet, since the mutant Zif268 loses all specificity while F2-Arg 
registers no change in specificity. However, note that two stabilising interactions are 
abolished: an intramolecular buttressing interaction with Arg-1 on finger 3 and also the 
intermolecular contact with the secondary DNA strand. An independent comparison of 
wild-type Zif268 binding to its consensus binding site flanked by G/T or A/C also found a 
five-fold reduction in affinity for those sites which are unable to satisfy a contact from 
Asp2 to the secondary DNA strand (Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. 
Biol. 15, 2275-2287). While the effects of perturbations in the DNA structure cannot be 
discounted in this case, the results of both experiments would seem to suggest that the 
reduction in binding affinity results from loss of the protein-DNA contact. Nevertheless, 
the intramolecular contact between positions -1 and 2 in a zinc finger is a further level of 
synergy which may have to be taken into account before the full picture emerges, 
describing the possible networks of contacts which occur at the protein-DNA interface in 
the region of the overlapping subsites. 
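
For scale, a fold change in dissociation constant can be converted to a free energy difference using ΔΔG = RT ln(fold change). The Python sketch below assumes that the ratio of apparent dissociation constants can be treated as a ratio of true equilibrium constants at the 20°C binding temperature; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold: float, temp_c: float = 20.0) -> float:
    """Return RT*ln(fold) in kcal/mol for a fold change in Kd at temp_c (Celsius)."""
    return R_KCAL * (temp_c + 273.15) * math.log(fold)


ddg_four_fold = ddg_from_fold_change(4.0)  # four-fold loss of affinity at 20 C
```

Under these assumptions a four-fold reduction corresponds to roughly 0.8 kcal/mol.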
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(iv) ANTI -SENSE: NO 



(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 1..264 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

GCA GAA GAG AAG CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC    48
Ala Glu Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser
1               5                   10                  15

GAT CGT AGT AGT CTT ACC CGC CAC ACG AGG ACC CAC ACA GGC GAG AAG    96
Asp Arg Ser Ser Leu Thr Arg His Thr Arg Thr His Thr Gly Glu Lys
            20                  25                  30

CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC AGG AGC GAT AAC    144
Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp Asn
        35                  40                  45

CTT ACG AGA CAC CTA AGG ACC CAC ACA GGC GAG AAG CCT TTT CAG TGT    192
Leu Thr Arg His Leu Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys
    50                  55                  60

CGA ATC TGC ATG CGT AAC TTC AGG CAA GCT GAT CAT CTT CAA GAG CAC    240
Arg Ile Cys Met Arg Asn Phe Arg Gln Ala Asp His Leu Gln Glu His
65                  70                  75                  80

CTA AAG ACC CAC ACA GGC GAG AAG                                    264
Leu Lys Thr His Thr Gly Glu Lys
                85
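
The coding sequence and its translation in SEQ ID NO: 1 can be cross-checked with a short Python sketch. The garbled codon at the end of the first row is read here as AGC, an assumption consistent with the Ser given beneath it and with the identical framework codons in the later fingers; the variable names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 1, with the last codon of row 1 read as AGC (assumed).
SEQ_ID_NO_1 = """
GCA GAA GAG AAG CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC
GAT CGT AGT AGT CTT ACC CGC CAC ACG AGG ACC CAC ACA GGC GAG AAG
CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC AGG AGC GAT AAC
CTT ACG AGA CAC CTA AGG ACC CAC ACA GGC GAG AAG CCT TTT CAG TGT
CGA ATC TGC ATG CGT AAC TTC AGG CAA GCT GAT CAT CTT CAA GAG CAC
CTA AAG ACC CAC ACA GGC GAG AAG
"""

codons = SEQ_ID_NO_1.split()                       # 88 codons, 264 nucleotides
protein = "".join(CODON_TABLE[c] for c in codons)  # one-letter translation
```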
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Claims: 



1. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, 
wherein binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding 
motif in the protein is determined as follows: 

a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and position 
++2 is not Asp; 

b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as 
long as position ++2 in the α-helix is not Asp. 

2. A method according to claim 1, wherein binding to base 4 of the quadruplet by an 
α-helical zinc finger nucleic acid binding motif in the protein is additionally determined as 
follows: 

c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 
is Ser or Thr and position ++2 is Asp; 

d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and 
position ++2 is Asp. 
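
By way of illustration only, the base 4 conditions a) to d) recited in claims 1 and 2 can be written as a simple predicate in Python. The function and parameter names are hypothetical; position ++2 is read here as position +2 of the neighbouring finger, in the manner of the Asp2 of finger 3 discussed in Example 5, and residues are given as three-letter codes.

```python
def base4_rule_satisfied(base4: str, pos6: str, pos_pp2: str) -> bool:
    """Check a (+6, ++2) residue pair against the base 4 conditions of claims 1 and 2."""
    if base4 == "A":
        # a) +6 is Gln and ++2 is not Asp
        return pos6 == "Gln" and pos_pp2 != "Asp"
    if base4 == "C":
        # b) +6 may be any residue, as long as ++2 is not Asp
        return pos_pp2 != "Asp"
    if base4 == "G":
        # c) +6 is Arg; or +6 is Ser or Thr and ++2 is Asp
        return pos6 == "Arg" or (pos6 in ("Ser", "Thr") and pos_pp2 == "Asp")
    if base4 == "T":
        # d) +6 is Ser or Thr and ++2 is Asp
        return pos6 in ("Ser", "Thr") and pos_pp2 == "Asp"
    raise ValueError(f"unexpected base: {base4!r}")


# Example: Arg at +6 satisfies the condition for G whatever occupies ++2.
g_with_arg = base4_rule_satisfied("G", "Arg", "Ala")
```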

3. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, 
wherein binding to each base of the quadruplet by an α-helical zinc finger nucleic acid 
binding motif in the protein is determined as follows: 

a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 
is Ser or Thr and position ++2 is Asp; 

b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and position 
++2 is not Asp; 
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7. A method according to any one of claims 4 to 6 wherein X b is T or I. 

8. A method according to any one of claims 4 to 7 wherein X2-3 is G-K-A, G-K-C, 
G-K-S, G-K-G, M-R-N or M-R. 

9. A method according to any one of claims 4 to 8 wherein the linker is T-G-E-K or T- 
G-E-K-P. 

10. A method according to any one of claims 4 to 9 wherein position +9 is R or K. 

11. A method according to any one of claims 4 to 10 wherein positions +1, +5 and +8 
are not occupied by any one of the hydrophobic amino acids, F, W or Y. 

12. A method according to claim 11 wherein positions +1, +5 and +8 are occupied by 
the residues K, T and Q respectively. 

13. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a target nucleic acid sequence, comprising the steps of: 

a) selecting a model zinc finger domain from the group consisting of naturally occurring 
zinc fingers and consensus zinc fingers; and 

b) mutating the finger according to the rules set in any one of claims 1 to 3. 

14. A method according to claim 13, wherein the model zinc finger is a consensus zinc 
finger whose structure is selected from the group consisting of the consensus structure 
PYKCPECGKSFSQKSDLVKHQRTHTG, and the consensus structure 
PYKCSECGKAFSQKSNLTRHQRIHTGEKP. 
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22. A method according to claim 21, comprising the steps of: 

a) preparing a nucleic acid construct capable of expressing a fusion protein comprising the 
nucleic acid binding protein and a minor coat protein of a filamentous bacteriophage; 

b) preparing further nucleic acid constructs capable of expressing a fusion protein 
comprising a selectively mutated nucleic acid binding protein and a minor coat protein of 
a filamentous bacteriophage; 

c) causing the fusion proteins defined in steps (a) and (b) to be expressed on the surface of 
bacteriophage transformed with the nucleic acid constructs; 

d) assaying the ability of the bacteriophage to bind the target nucleic acid sequence and 
selecting the bacteriophage demonstrating superior binding characteristics. 

23. A method according to any one of claims 20 to 22 wherein the nucleic acid binding 
protein is selectively randomised at any one of positions +1, +5, +8, -1, +2, +3 or +6. 

24. A method for determining the presence of a target nucleic acid molecule, 
comprising the steps of: 

a) preparing a nucleic acid binding protein by the method of any preceding claim which is 
specific for the target nucleic acid molecule; 

b) exposing a test system comprising the target nucleic acid molecule to the nucleic acid 
binding protein under conditions which promote binding, and removing any nucleic acid 
binding protein which remains unbound; 

c) detecting the presence of the nucleic acid binding protein in the test system. 

25. A method according to claim 24, wherein the presence of the nucleic acid binding 
protein in the test system is detected by means of an antibody. 

26. A method according to claim 24 or claim 25 wherein the nucleic acid binding 
protein, in use, is displayed on the surface of a filamentous bacteriophage and the presence 
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